9 research outputs found

    Characterizing and Improving Stability in Neural Style Transfer

    Recent progress in style transfer on images has focused on improving the quality of stylized images and the speed of methods. However, real-time methods are highly unstable, resulting in visible flickering when applied to videos. In this work we characterize the instability of these methods by examining the solution set of the style transfer objective. We show that the trace of the Gram matrix representing style is inversely related to the stability of the method. Then, we present a recurrent convolutional network for real-time video style transfer which incorporates a temporal consistency loss and overcomes the instability of prior methods. Our networks can be applied at any resolution, do not require optical flow at test time, and produce high-quality, temporally consistent stylized videos in real time.
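    The two quantities the abstract leans on are the Gram matrix of style features (whose trace the authors relate to stability) and a temporal consistency loss between consecutive stylized frames. Below is a minimal PyTorch-style sketch of both; the simplified temporal term directly compares consecutive outputs and is only a stand-in, since the paper's actual training loss (e.g. any flow-warped comparison) is not spelled out in the abstract.

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a style-layer feature map.

    features: (batch, channels, height, width) activations.
    Returns (batch, channels, channels); its trace is the quantity the
    abstract relates to stability.
    """
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def temporal_consistency_loss(stylized_t: torch.Tensor,
                              stylized_prev: torch.Tensor) -> torch.Tensor:
    """Simplified temporal term: penalize change between consecutive
    stylized frames (illustrative only, not the paper's exact loss)."""
    return torch.mean((stylized_t - stylized_prev) ** 2)

# Example: per-batch trace of the style Gram matrix for random features.
feats = torch.randn(1, 64, 32, 32)
g = gram_matrix(feats)
print(g.diagonal(dim1=1, dim2=2).sum(dim=1))
```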

    WiForceSticker: Batteryless, Thin Sticker-like Flexible Force Sensor

    Any two objects in contact with each other exert a force that could be due simply to gravity or to mechanical contact, such as a robotic arm gripping an object or even the contact between two bones at our knee joints. The ability to naturally measure and monitor these contact forces enables a plethora of applications, from warehouse management (detecting faulty packages based on weight) to robotics (making a robotic arm's grip as sensitive as human skin) and healthcare (knee implants). It is challenging to design a ubiquitous force sensor that can be used naturally for all these applications. First, the sensor should be small enough to fit in narrow spaces. Next, we don't want to lay cumbersome cables to read the force values from the sensors. Finally, we need a battery-free design to meet in-vivo applications. We develop WiForceSticker, a wireless, battery-free, sticker-like force sensor that can be ubiquitously deployed on any surface, such as warehouse packages, robotic arms, and knee joints. WiForceSticker first designs a tiny 4 mm × 2 mm × 0.4 mm capacitive sensor equipped with a 10 mm × 10 mm antenna on a flexible PCB substrate. Second, by interfacing the sensor with COTS RFID systems, it introduces a new mechanism to transduce the force information onto ambient RF radiation that can be read wirelessly by a remotely located reader, without requiring any battery or active components at the force sensor. The sensor can detect forces in the range of 0-6 N with a sensing accuracy of <0.5 N across multiple testing environments, evaluated with over 10,000 presses on the sensor at varying force levels. We also showcase two application case studies with our designed sensors: weighing warehouse packages and sensing forces applied by bone joints.
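    The abstract describes a thin capacitive sensor whose capacitance shifts under applied force before being read out over RFID backscatter. The sketch below uses only a textbook parallel-plate model (C = εA/d with a linearly compressing dielectric) to illustrate that transduction idea. The plate dimensions follow the abstract, but the permittivity, stiffness, and readout are assumed illustrative values, not the paper's actual design.

```python
# Illustrative parallel-plate model of a force-dependent capacitor.
# Dimensions follow the abstract (4 mm x 2 mm plate, 0.4 mm thick);
# permittivity and stiffness are made-up values for illustration only.
EPS0 = 8.854e-12          # vacuum permittivity, F/m
EPS_R = 3.0               # assumed relative permittivity of the dielectric
AREA = 4e-3 * 2e-3        # plate area, m^2
GAP0 = 0.4e-3             # uncompressed dielectric thickness, m
STIFFNESS = 2.0e4         # assumed spring constant of the dielectric, N/m

def capacitance_under_force(force_n: float) -> float:
    """Capacitance (F) when the gap compresses linearly with force."""
    gap = max(GAP0 - force_n / STIFFNESS, 0.05e-3)  # clamp to avoid collapse
    return EPS0 * EPS_R * AREA / gap

for f in (0.0, 2.0, 4.0, 6.0):   # abstract reports a 0-6 N sensing range
    print(f"{f:.1f} N -> {capacitance_under_force(f) * 1e12:.2f} pF")
```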

    VIMA: General Robot Manipulation with Multimodal Prompts

    Prompt-based learning has emerged as a successful paradigm in natural language processing, where a single general-purpose language model can be instructed to perform any task specified by input prompts. Yet task specification in robotics comes in various forms, such as imitating one-shot demonstrations, following language instructions, and reaching visual goals. They are often considered different tasks and tackled by specialized models. We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts, interleaving textual and visual tokens. Accordingly, we develop a new simulation benchmark that consists of thousands of procedurally-generated tabletop tasks with multimodal prompts, 600K+ expert trajectories for imitation learning, and a four-level evaluation protocol for systematic generalization. We design a transformer-based robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively. VIMA features a recipe that achieves strong model scalability and data efficiency. It outperforms alternative designs in the hardest zero-shot generalization setting by up to 2.9× task success rate given the same training data. With 10× less training data, VIMA still performs 2.7× better than the best competing variant. Code and video demos are available at https://vimalabs.github.io (ICML 2023 camera-ready version).
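    The core idea in the abstract is that one prompt format can interleave text and image tokens. The sketch below shows, under assumed token types and a hypothetical flattening function (not VIMA's actual interface), how such an interleaved prompt might be turned into a single token stream for a transformer policy.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical token types for an interleaved text/image prompt.
@dataclass
class TextSegment:
    text: str

@dataclass
class ImageSegment:
    name: str          # placeholder for an image or object crop

PromptPart = Union[TextSegment, ImageSegment]

def flatten_prompt(parts: List[PromptPart]) -> List[str]:
    """Turn an interleaved multimodal prompt into one token stream.
    A real system would emit embeddings; strings are used here only to
    show the interleaving."""
    tokens: List[str] = []
    for part in parts:
        if isinstance(part, TextSegment):
            tokens.extend(part.text.split())
        else:
            tokens.append(f"<img:{part.name}>")
    return tokens

# A rearrangement-style task expressed as one multimodal prompt.
prompt = [TextSegment("Put the"), ImageSegment("red_block"),
          TextSegment("into the"), ImageSegment("green_bowl")]
print(flatten_prompt(prompt))
```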

    Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks

    Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments. This is challenging because human motion is inherently multimodal: given a history of human motion paths, there are many socially plausible ways that people could move in the future. We tackle this problem by combining tools from sequence prediction and generative adversarial networks: a recurrent sequence-to-sequence model observes motion histories and predicts future behavior, using a novel pooling mechanism to aggregate information across people. We predict socially plausible futures by training adversarially against a recurrent discriminator, and encourage diverse predictions with a novel variety loss. Through experiments on several datasets, we demonstrate that our approach outperforms prior work in terms of accuracy, variety, collision avoidance, and computational complexity.
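    The "variety loss" mentioned in the abstract encourages diverse outputs by penalizing only the best of k sampled trajectories, leaving the other samples free to cover alternative plausible futures. Below is a small sketch of that min-over-samples idea with plain tensors; the generator, discriminator, and pooling module are omitted.

```python
import torch

def variety_loss(pred_samples: torch.Tensor, ground_truth: torch.Tensor) -> torch.Tensor:
    """Best-of-k trajectory loss.

    pred_samples: (k, T, 2) -- k sampled future trajectories (T steps, x/y).
    ground_truth: (T, 2)    -- observed future trajectory.
    Only the closest sample is penalized, which encourages diversity
    across the remaining samples.
    """
    errors = torch.norm(pred_samples - ground_truth.unsqueeze(0), dim=-1)  # (k, T)
    per_sample = errors.mean(dim=-1)                                       # (k,)
    return per_sample.min()

# Example with k=5 random samples over a 12-step prediction horizon.
preds = torch.randn(5, 12, 2)
gt = torch.randn(12, 2)
print(variety_loss(preds, gt))
```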

    RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation

    The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a foundation agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming multi-embodiment action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.
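    The self-improvement loop described in the abstract (fine-tune on demonstrations, deploy to collect new experience, fold successful episodes back into the training set) can be outlined as below. Every function and field name here is a hypothetical placeholder standing in for the real training, rollout, and filtering machinery, not RoboCat's actual interface.

```python
# Hedged outline of the self-improvement loop described in the abstract.
# All functions are placeholders; only the loop structure is the point.

def self_improvement_loop(agent, dataset, demos, n_iterations=3):
    # Seed adaptation with a small set of target-task demonstrations
    # (the abstract reports 100-1000 examples per target task).
    dataset = dataset + demos
    for _ in range(n_iterations):
        agent = finetune(agent, dataset)            # train on current data
        episodes = collect_rollouts(agent)          # deploy on the task
        successes = [e for e in episodes if e["success"]]
        dataset = dataset + successes               # grow the training set
    return agent

def finetune(agent, dataset):
    return agent  # placeholder: real code would run gradient updates

def collect_rollouts(agent, n=10):
    # Placeholder rollouts with a made-up success flag.
    return [{"success": i % 2 == 0, "trajectory": []} for i in range(n)]

# Dummy invocation to show the loop running end to end.
trained = self_improvement_loop(agent=object(), dataset=[],
                                demos=[{"success": True, "trajectory": []}])
```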